Memory Allocation Discipline Example

🧪 1️⃣ BenchmarkDotNet — Measuring Allocation Discipline

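This microbenchmark compares two implementations of tick parsing:

Naive: uses string.Split() and double.Parse()
Optimized: uses ReadOnlySpan<byte> + Utf8Parser (no intermediate strings for the prices; as written it still allocates the encoded byte[] and the symbol string)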

📄 `TickParsingBenchmarks.cs`

using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // shows allocations in bytes per operation
public class TickParsingBenchmarks
{
    private readonly string tickLine = "EURUSD,1.07432,1.07436";

    [Benchmark(Baseline = true)]
    public (string, double, double) NaiveParse()
    {
        var parts = tickLine.Split(',');
        var symbol = parts[0];
        var bid = double.Parse(parts[1]);
        var ask = double.Parse(parts[2]);
        return (symbol, bid, ask);
    }

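    [Benchmark]
    public (string, double, double) SpanParse()
    {
        // Note: GetBytes allocates a new byte[] on every call. A real feed
        // handler would already hold the bytes (for example in a socket buffer).
        ReadOnlySpan<byte> span = Encoding.ASCII.GetBytes(tickLine);

        // Locate the two field separators without creating substrings.
        int firstComma = span.IndexOf((byte)',');
        int secondComma = span.Slice(firstComma + 1).IndexOf((byte)',') + firstComma + 1;

        // The symbol is the only string created per tick.
        string symbol = Encoding.ASCII.GetString(span[..firstComma]);
        // Prices are parsed straight from the bytes. The TryParse results are
        // ignored here because the benchmark input is known to be well-formed.
        Utf8Parser.TryParse(span[(firstComma + 1)..secondComma], out double bid, out _);
        Utf8Parser.TryParse(span[(secondComma + 1)..], out double ask, out _);

        return (symbol, bid, ask);
    }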

    public static void Main() => BenchmarkRunner.Run<TickParsingBenchmarks>();
}

⚙️ Run it:

dotnet add package BenchmarkDotNet
dotnet run -c Release

🧾 Expected results (typical output):

|    Method |       Mean |   Allocated |
|----------- |-----------:|------------:|
| NaiveParse |   1.200 μs |     1.24 KB |
| SpanParse  |   0.245 μs |       32 B  |

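💡 Interpretation:

In this sample output, the optimized version is ~5× faster.
It reduces allocations from ~1.2 KB → ~32 bytes per tick.
At 1M ticks/sec, that’s ~1.2 GB less allocation per second, a huge difference for a trading backend.
Treat these figures as indicative only: exact timings and byte counts depend on the runtime version, hardware, and input, so run the benchmark on your own target before quoting numbers.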

💹 2️⃣ Realistic Tick Processor Example

Now let’s build a GC-efficient Tick parser — something you can confidently mention if they ask, “How would you design a real-time price feed handler?”

📄 `TickProcessor.cs`

using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

public readonly struct Tick
{
    public string Symbol { get; }
    public double Bid { get; }
    public double Ask { get; }

    public Tick(string symbol, double bid, double ask)
    {
        Symbol = symbol;
        Bid = bid;
        Ask = ask;
    }

    public override string ToString() => $"{Symbol}: {Bid:F5}/{Ask:F5}";
}

public class TickProcessor
{
    private readonly ArrayPool<byte> _bufferPool = ArrayPool<byte>.Shared;

    public void ProcessBatch(string[] rawTicks)
    {
        foreach (var tickStr in rawTicks)
        {
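            // Rent a buffer big enough for this line (avoids allocating a new byte[] each time).
            // A fixed Rent(256) would make GetBytes throw on any line longer than the buffer.
            var buffer = _bufferPool.Rent(Encoding.ASCII.GetMaxByteCount(tickStr.Length));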
            try
            {
                int bytesWritten = Encoding.ASCII.GetBytes(tickStr, buffer);
                var span = new ReadOnlySpan<byte>(buffer, 0, bytesWritten);

                var tick = ParseTick(span);
                OnTick(tick);
            }
            finally
            {
                _bufferPool.Return(buffer);
            }
        }
    }

    private static Tick ParseTick(ReadOnlySpan<byte> span)
    {
        // EURUSD,1.07432,1.07436
        int firstComma = span.IndexOf((byte)',');
        int secondComma = span.Slice(firstComma + 1).IndexOf((byte)',') + firstComma + 1;

        string symbol = Encoding.ASCII.GetString(span[..firstComma]);
        Utf8Parser.TryParse(span[(firstComma + 1)..secondComma], out double bid, out _);
        Utf8Parser.TryParse(span[(secondComma + 1)..], out double ask, out _);

        return new Tick(symbol, bid, ask);
    }

    private void OnTick(in Tick tick)
    {
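        // Simulate publishing or processing the tick.
        // Demo only: Console.WriteLine boxes the struct and formats a string,
        // so this line allocates. A real handler would publish without formatting.
        Console.WriteLine(tick);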
    }
}

public static class Program
{
    public static void Main()
    {
        var ticks = new[]
        {
            "EURUSD,1.07432,1.07436",
            "GBPUSD,1.24587,1.24592",
            "USDJPY,151.229,151.238",
        };

        var processor = new TickProcessor();
        processor.ProcessBatch(ticks);
    }
}
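
ParseTick above assumes every line is well-formed: a missing comma or a non-numeric price is silently turned into a garbage tick. The following is a minimal sketch of a validating variant, assuming the same `SYMBOL,bid,ask` layout. `SafeTickParser` and `TryParseTick` are hypothetical names introduced for illustration, and the method returns its fields through `out` parameters so the block stays self-contained.

```csharp
using System;
using System.Buffers.Text;
using System.Text;

public static class SafeTickParser
{
    // Hypothetical helper: returns false on malformed input instead of
    // producing a tick with default or partially parsed values.
    public static bool TryParseTick(
        ReadOnlySpan<byte> line, out string symbol, out double bid, out double ask)
    {
        symbol = string.Empty;
        bid = 0;
        ask = 0;

        int firstComma = line.IndexOf((byte)',');
        if (firstComma <= 0) return false; // no separator, or empty symbol

        ReadOnlySpan<byte> rest = line.Slice(firstComma + 1);
        int secondComma = rest.IndexOf((byte)',');
        if (secondComma <= 0) return false; // no second separator, or empty bid

        ReadOnlySpan<byte> bidField = rest.Slice(0, secondComma);
        ReadOnlySpan<byte> askField = rest.Slice(secondComma + 1);

        // Utf8Parser accepts a numeric prefix, so also require that the whole
        // field was consumed; this rejects inputs such as "1.07x".
        if (!Utf8Parser.TryParse(bidField, out bid, out int bidBytes) || bidBytes != bidField.Length)
            return false;
        if (!Utf8Parser.TryParse(askField, out ask, out int askBytes) || askBytes != askField.Length)
            return false;

        // Allocate the symbol string only once the line is known to be valid.
        symbol = Encoding.ASCII.GetString(line.Slice(0, firstComma));
        return true;
    }
}
```

Creating the symbol string last means rejected lines cost no allocation at all, which matters if a feed can send bursts of malformed data.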

💡 Key improvements explained

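| Improvement | Why it matters |
|-------------|----------------|
| `ArrayPool<byte>.Shared` | Reuses buffers instead of allocating a new byte[] per line, which cuts Gen 0 churn (the Large Object Heap only comes into play for arrays of roughly 85,000 bytes or more) |
| `ReadOnlySpan<byte>` | Zero-copy slicing of incoming data |
| `Utf8Parser` | Parses numeric values directly from bytes (no string allocations) |
| `readonly struct Tick` | Immutable value type stored inline, so there is no separate heap object per tick (the Symbol string it references is still heap-allocated) |
| `in Tick` | Passes the struct by readonly reference, so no copy is made at the call |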

🧩 Memory profile

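✅ Only one small string allocation per tick (Symbol) on the parsing path
✅ No arrays or temporary strings per line
✅ Encoding buffers are reused via the pool
✅ What is allocated is short-lived and collected in Gen 0, which helps keep latency steady (the Console.WriteLine in OnTick is demo output and does allocate)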
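
The one remaining allocation in the list above is the Symbol string. If the set of symbols is small and stable, it can be removed by caching one string per symbol and looking it up from the raw bytes. The sketch below is one possible approach, not part of the original processor: `SymbolCache` is a hypothetical helper, it assumes symbols of at most 7 ASCII bytes so they can be packed into a `ulong` key, and it is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper. Assumes a small, stable symbol universe and a single
// parsing thread per instance; the cache never evicts entries.
public sealed class SymbolCache
{
    private const int MaxPackedLength = 7;
    private readonly Dictionary<ulong, string> _symbols = new Dictionary<ulong, string>();

    public string GetOrAdd(ReadOnlySpan<byte> symbolBytes)
    {
        if (symbolBytes.Length > MaxPackedLength)
        {
            // Too long to pack into the key: fall back to allocating.
            return Encoding.ASCII.GetString(symbolBytes);
        }

        // Length goes in the top byte, symbol bytes in the low 56 bits,
        // so symbols of different lengths can never share a key.
        ulong key = (ulong)symbolBytes.Length << 56;
        for (int i = 0; i < symbolBytes.Length; i++)
        {
            key |= (ulong)symbolBytes[i] << (8 * i);
        }

        if (!_symbols.TryGetValue(key, out var symbol))
        {
            // First sighting: allocate once and remember the instance.
            symbol = Encoding.ASCII.GetString(symbolBytes);
            _symbols[key] = symbol;
        }
        return symbol;
    }
}
```

In ParseTick, the call `Encoding.ASCII.GetString(span[..firstComma])` would become `cache.GetOrAdd(span[..firstComma])`; after the first tick for each symbol, the lookup returns the cached instance instead of creating a new string.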

🧠 Discussion points for your interview

When asked “How do you ensure your system stays fast under high load?” — say:

“I design for allocation discipline — especially in tight loops. For example, in our tick processor, we rent buffers from ArrayPool<T>, parse with Span<byte> and Utf8Parser to avoid string and array allocations, and use small readonly structs for data. That keeps all transient data in Gen 0 and prevents Gen 2 pressure or LOH fragmentation. In load tests, we confirmed negligible GC activity and stable latency even at millions of ticks per second.”

✅ Pro tip

You can mention:

“In production, I monitor dotnet-counters — if Gen 2 GC Count increases, that’s a red flag that something’s allocating too much. Then I use dotnet-trace or dotMemory to find the source.”


Questions & Answers

Q: What does the benchmark prove when comparing Split vs Span parsing?

A: It shows the optimized implementation is faster and allocates far less per tick (tens of bytes versus roughly a kilobyte in the sample output). At a million ticks per second, that difference adds up to about a gigabyte less allocation every second.

Q: Why is Utf8Parser preferred over double.Parse here?

A: Utf8Parser operates directly on byte spans, avoiding string allocations and culture-dependent parsing. It’s ideal for fixed-format protocols and keeps parsing allocation-free.

Q: How does renting buffers from ArrayPool<byte> help batch processing?

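A: Each tick line is encoded into a rented, reusable buffer instead of a newly allocated byte array. Returning the buffer lets the next iteration reuse it, so steady-state memory usage stays flat regardless of batch size. Buffers this small never reach the Large Object Heap; pooling helps the LOH only once arrays are roughly 85,000 bytes or larger.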

Q: Why make Tick a readonly struct?

A: It keeps the data inline, prevents accidental mutation, and avoids heap allocations when passing ticks around. Combined with in Tick parameters, we avoid copies even for frequent calls.

Q: What’s the benefit of in Tick on the OnTick method?

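A: It passes the struct by readonly reference, so the callee reads the caller's value instead of receiving a copy. Because Tick is a readonly struct, the compiler also has no need to make defensive copies when its members are accessed. For a struct this small (one reference and two doubles, 24 bytes on 64-bit) the gain is modest; it grows with the size of the struct.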

Q: How would you extend this pattern for multi-threaded processing?

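A: Use System.Threading.Channels to fan out parsed ticks to consumers, and keep the parsed structs allocation-free. A Span<byte> cannot be stored in a channel or held across an await, so hand consumers the parsed Tick values (or pooled buffers wrapped in Memory<byte>) and let each consumer reuse its own buffers until the serialization boundary.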
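
A minimal sketch of that fan-out, assuming a single producer and a bounded channel so a slow consumer applies back-pressure instead of letting the queue grow. `Quote`, `TickFanOut`, the capacity of 1024, and the use of an integer symbol id are all illustrative assumptions, not part of the original sample.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for Tick: a symbol id instead of a string keeps the
// struct free of references.
public readonly struct Quote
{
    public int SymbolId { get; }
    public double Bid { get; }
    public double Ask { get; }

    public Quote(int symbolId, double bid, double ask)
    {
        SymbolId = symbolId;
        Bid = bid;
        Ask = ask;
    }
}

public static class TickFanOut
{
    // Writes tickCount quotes to a bounded channel and returns how many the
    // consumers processed in total.
    public static async Task<long> RunAsync(int tickCount, int consumerCount)
    {
        var channel = Channel.CreateBounded<Quote>(new BoundedChannelOptions(1024)
        {
            SingleWriter = true,
            FullMode = BoundedChannelFullMode.Wait // producer waits when the queue is full
        });

        long processed = 0;
        var consumers = new Task[consumerCount];
        for (int i = 0; i < consumerCount; i++)
        {
            consumers[i] = Task.Run(async () =>
            {
                ChannelReader<Quote> reader = channel.Reader;
                while (await reader.WaitToReadAsync())
                {
                    while (reader.TryRead(out Quote quote))
                    {
                        // Real code would handle the quote here; the sketch only counts.
                        Interlocked.Increment(ref processed);
                    }
                }
            });
        }

        for (int i = 0; i < tickCount; i++)
        {
            await channel.Writer.WriteAsync(new Quote(i % 3, 1.0 + i, 1.1 + i));
        }
        channel.Writer.Complete(); // lets WaitToReadAsync return false once drained

        await Task.WhenAll(consumers);
        return processed;
    }
}
```

The quotes travel through the channel by value, so no per-tick object is created for the hand-off itself.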

Q: How do you verify there are no hidden allocations?

A: Run the benchmark with MemoryDiagnoser, inspect ETW events, or instrument code with GC.GetAllocatedBytesForCurrentThread() to ensure the optimized method stays within expected allocation budgets.
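
For a quick check outside BenchmarkDotNet, the GC.GetAllocatedBytesForCurrentThread() approach can be wrapped in a small helper. This is a rough sketch rather than a replacement for a proper benchmark; `AllocationCheck` and `MeasureAllocatedBytes` are hypothetical names, and the measured delegate must run on the calling thread for the counter to see its allocations.

```csharp
using System;
using System.Buffers.Text;
using System.Text;

public static class AllocationCheck
{
    // Hypothetical helper: managed bytes allocated on the calling thread
    // while running `action` the given number of times.
    public static long MeasureAllocatedBytes(Action action, int iterations)
    {
        action(); // warm-up call so one-time initialisation is not counted

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Demo()
    {
        // Encoded once, outside the measured region.
        byte[] price = Encoding.ASCII.GetBytes("1.07432");
        string line = "EURUSD,1.07432,1.07436";
        double sum = 0;
        int fields = 0;

        long spanBytes = MeasureAllocatedBytes(() =>
        {
            Utf8Parser.TryParse(price, out double value, out _);
            sum += value;
        }, 10_000);

        long splitBytes = MeasureAllocatedBytes(() =>
        {
            fields += line.Split(',').Length; // allocates an array plus three strings
        }, 10_000);

        Console.WriteLine($"Utf8Parser: {spanBytes} bytes, Split: {splitBytes} bytes");
    }
}
```

The delegates are created before the first counter read, so their closure objects are not included in the result.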

Q: What happens if you forget to return buffers to the pool?

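A: The pool runs out of cached arrays, so Rent falls back to allocating new ones and the benefit of pooling is lost. The unreturned arrays are ordinary managed objects and are still collected once nothing references them, so this is wasted allocation rather than a true leak. Always return inside finally blocks to ensure deterministic cleanup.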
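
One way to make the return harder to forget is to tie it to a `using` block. The sketch below is an illustrative pattern, not something the original sample uses: `PooledBuffer` and `PooledBufferDemo` are hypothetical names, and because the wrapper is a struct, copying it and disposing both copies would return the same array twice, so it should be used only as a local.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical wrapper: rents in the constructor, returns in Dispose.
public readonly struct PooledBuffer : IDisposable
{
    private readonly byte[] _array;

    public PooledBuffer(int minimumLength)
    {
        _array = ArrayPool<byte>.Shared.Rent(minimumLength);
    }

    // The whole rented array; it may be longer than the requested minimum.
    public Span<byte> Span => _array;

    public void Dispose()
    {
        if (_array != null)
        {
            ArrayPool<byte>.Shared.Return(_array);
        }
    }
}

public static class PooledBufferDemo
{
    // Encodes text into a rented buffer and returns the number of bytes written.
    public static int EncodeLength(string text)
    {
        using (var buffer = new PooledBuffer(Encoding.ASCII.GetMaxByteCount(text.Length)))
        {
            // The buffer goes back to the pool when this block exits,
            // including when GetBytes throws.
            return Encoding.ASCII.GetBytes(text.AsSpan(), buffer.Span);
        }
    }
}
```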

Q: How can you adapt this sample for binary protocols?

A: Replace ASCII parsing with direct span slicing over binary fields, using BinaryPrimitives or custom parsing logic; the same pooling and span principles apply.
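
As an illustration, here is a minimal sketch for an invented fixed-width message: a 4-byte little-endian symbol id followed by two 8-byte little-endian IEEE 754 doubles. The layout, `BinaryTickReader`, and `MessageLength` are assumptions made for the example and do not describe any real protocol.

```csharp
using System;
using System.Buffers.Binary;

public static class BinaryTickReader
{
    // Assumed layout: [0..4) symbol id, [4..12) bid, [12..20) ask, all little-endian.
    public const int MessageLength = 20;

    public static bool TryRead(
        ReadOnlySpan<byte> message, out int symbolId, out double bid, out double ask)
    {
        symbolId = 0;
        bid = 0;
        ask = 0;

        if (message.Length < MessageLength) return false; // truncated message

        symbolId = BinaryPrimitives.ReadInt32LittleEndian(message);

        // Read the raw 64-bit patterns, then reinterpret them as doubles.
        bid = BitConverter.Int64BitsToDouble(
            BinaryPrimitives.ReadInt64LittleEndian(message.Slice(4)));
        ask = BitConverter.Int64BitsToDouble(
            BinaryPrimitives.ReadInt64LittleEndian(message.Slice(12)));
        return true;
    }
}
```

Nothing here creates a string or an array, and the explicit little-endian reads give the same result regardless of the host machine's byte order.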

Q: How do you integrate this with logging or metrics without reintroducing allocations?

A: Emit structured logs with message templates, avoid string concatenation, and aggregate metrics using counters/gauges. When necessary, log summaries rather than per-tick details to keep the hot path clean.
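
A minimal sketch of the "summaries rather than per-tick details" idea: the hot path only increments counters, and a timer or background loop formats one summary line per interval. `TickStats` and its method names are hypothetical, and a real system would likely feed these values into its metrics library instead of building a string.

```csharp
using System;
using System.Threading;

// Hypothetical aggregator: cheap increments on the hot path, formatting off it.
public sealed class TickStats
{
    private long _accepted;
    private long _rejected;

    // Hot path: no allocation, just an atomic increment.
    public void RecordAccepted() => Interlocked.Increment(ref _accepted);
    public void RecordRejected() => Interlocked.Increment(ref _rejected);

    // Called periodically from outside the hot path. Resets the counters and
    // returns one summary line for the interval (this call does allocate).
    public string FlushSummary()
    {
        long accepted = Interlocked.Exchange(ref _accepted, 0);
        long rejected = Interlocked.Exchange(ref _rejected, 0);
        return $"ticks accepted={accepted} rejected={rejected}";
    }
}
```

With this split, the string allocation happens once per reporting interval rather than once per tick.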